Profiling

  • Timestamp queries provide your application with a mechanism to time the execution of commands on the GPU.

  • You can specify any pipeline stage at which the timestamp should be written, but many stage combinations and orderings won’t give meaningful results.

    • So while it may sound reasonable to write timestamps for the vertex and fragment shader stages directly one after another, that will usually not return meaningful results due to how the GPU works.

  • You can’t compare timestamps taken on different queues.

  • Sample.

    • We’ll be using 6 time points, one for the start and one for the end of three render passes.

    • The code in samples/api/timestamp_queries:

      • Uses QUERY_RESULT_64 | QUERY_RESULT_WAIT, which blocks the host until the results are available, so it's not optimal.

      • The query results are read back after vkQueueSubmit().

  • GPU Timing Basics.

    • Vulkan and DX12.

    • Uses QUERY_RESULT_64 and enables the hostQueryReset feature in vk.PhysicalDeviceVulkan12Features, then calls vk.ResetQueryPool() right after creating the query pool.

  • Queries.

  • vkCmdWriteTimestamp2.

    • This is pretty much the same as the vkCmdWriteTimestamp function used in this sample, but adds support for some additional pipeline stages using VkPipelineStageFlags2.

Support

  • Device limits:

    • timestampPeriod

      • Contains the number of nanoseconds it takes for a timestamp query value to be increased by 1 (one "tick").

      • If this limit of the physical device is greater than zero, timestamp queries are supported.

      • If your device has a timestampPeriod of 1, one increment in the result maps to exactly one nanosecond.

    • timestampComputeAndGraphics

      • If TRUE, timestamps are supported by every queue family that supports either graphics or compute operations.

      • If not, we need to check whether the queue family we want to use supports timestamps (its timestampValidBits must be non-zero).
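
  • A minimal sketch of how these limits turn raw timestamps into durations, using the six time points from the sample (begin/end for three passes). The function names are hypothetical. `valid_bits` stands for the queue family's `timestampValidBits`; masking the difference with it is an assumption added here so that a single counter wrap-around still yields the right delta.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `valid_bits` bits of a timestamp (1..64). */
uint64_t timestamp_mask(uint32_t valid_bits)
{
    assert(valid_bits >= 1 && valid_bits <= 64);
    return valid_bits >= 64 ? UINT64_MAX
                            : ((UINT64_C(1) << valid_bits) - 1);
}

/* Elapsed nanoseconds between two raw timestamps.
 * timestamp_period is the device limit: nanoseconds per tick.
 * The unsigned subtraction plus mask tolerates one wrap-around. */
double elapsed_ns(uint64_t begin, uint64_t end,
                  uint32_t valid_bits, float timestamp_period)
{
    uint64_t ticks = (end - begin) & timestamp_mask(valid_bits);
    return (double)ticks * (double)timestamp_period;
}

/* Converts six timestamps (begin/end per pass) into three
 * per-pass durations in milliseconds. */
void pass_durations_ms(const uint64_t stamps[6], uint32_t valid_bits,
                       float timestamp_period, double out_ms[3])
{
    for (int pass = 0; pass < 3; ++pass) {
        double ns = elapsed_ns(stamps[2 * pass], stamps[2 * pass + 1],
                               valid_bits, timestamp_period);
        out_ms[pass] = ns / 1.0e6;
    }
}
```

  • Usage: pass the six values read back from the query pool together with the device's timestampPeriod; each entry of `out_ms` is then the GPU time of one pass.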

Query Pool

  • A query pool stores the timestamp results and is used to either fetch them directly on the host or copy them into a buffer.

  • queryType

    • Set to QUERY_TYPE_TIMESTAMP to use timestamp queries.

  • queryCount

    • The maximum number of timestamp query results this pool can store.

Reset

  • Before we can start writing data to the query pool, we need to reset it.

  • vkCmdResetQueryPool

    • At the start of the command buffer.

    • Sets the status of query indices [firstQuery, firstQuery + queryCount - 1] to unavailable.

    • Defines an execution dependency with other query commands that reference the same query.

  • vkResetQueryPool()

    • From the host; requires the hostQueryReset feature.

  • QUERY_POOL_CREATE_RESET_KHR

    • During Query Pool creation.

Writing

  • vkCmdWriteTimestamp

    • Will request a timestamp to be written from the GPU for a certain pipeline stage and write that value to memory.

Reading

  • Reading back the results can be done in two ways:

    • Copy the results into a VkBuffer inside the command buffer using vkCmdCopyQueryPoolResults.

    • Get the results after the command buffer has finished executing using vkGetQueryPoolResults.

  • vkGetQueryPoolResults()

    • QUERY_RESULT_64

      • Tells the API that we want the results as 64-bit values. Without this flag, we would only get 32-bit values. Since timestamp queries can operate in nanoseconds, using only 32 bits could result in an overflow.

      • If your device has a timestampPeriod of 1, so that one increment in the result maps to exactly one nanosecond, with 32-bit precision you’d run into such an overflow after only about 4.29 seconds (2^32 ns).

    • QUERY_RESULT_WAIT

      • Tells the API to wait for all results to be available. When using this flag, the values written to our time_stamps vector are guaranteed to be available after calling vkGetQueryPoolResults.

      • This is fine for our use-case where we want to immediately access the results, but may introduce unnecessary stalls in other scenarios.

    • QUERY_RESULT_WITH_AVAILABILITY

      • Will let you poll the availability of the results and defer writing new timestamps until the results are available.

      • This should be the preferred way of fetching the results in a real-world application. With this flag, an additional availability value is inserted after each query value. If that value is non-zero, the result is available. You then check availability before writing the timestamp again.
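
  • A minimal sketch of two points from the flags above, in plain C without any Vulkan calls: how long an N-bit counter lasts before wrapping, and how to read one entry from a results array fetched with QUERY_RESULT_64 | QUERY_RESULT_WITH_AVAILABILITY. The function names are hypothetical, and the layout assumes the results were requested tightly packed, so each query occupies two consecutive 64-bit words (value, then availability).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Seconds until a `bits`-wide tick counter wraps around when
 * each tick lasts `period_ns` nanoseconds (the timestampPeriod). */
double seconds_until_wrap(unsigned bits, double period_ns)
{
    double ticks = 1.0;
    for (unsigned i = 0; i < bits; ++i) {
        ticks *= 2.0; /* builds 2^bits without needing <math.h> */
    }
    return ticks * period_ns / 1.0e9;
}

/* Reads query `index` from an array laid out as
 * [value0, avail0, value1, avail1, ...].
 * Returns 1 and stores the value if it is available, else 0. */
int read_timestamp_if_available(const uint64_t *results,
                                size_t query_count, size_t index,
                                uint64_t *out_value)
{
    assert(results != NULL && out_value != NULL);
    if (index >= query_count) {
        return 0;
    }
    if (results[2 * index + 1] == 0) {
        return 0; /* not written yet: poll again later */
    }
    *out_value = results[2 * index];
    return 1;
}
```

  • With a period of 1 ns, `seconds_until_wrap(32, 1.0)` gives roughly 4.29 seconds, while 64 bits last for centuries. A frame loop would call `read_timestamp_if_available` each frame and only reuse a query slot once it returns 1.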

Occlusion Queries

  • Occlusion queries track the number of samples that pass the per-fragment tests for a set of drawing commands. As such, occlusion queries are only available on queue families supporting graphics operations. The application can then use these results to inform future rendering decisions.

  • An occlusion query is begun and ended by calling vkCmdBeginQuery and vkCmdEndQuery, respectively.

  • When an occlusion query begins, the count of passing samples always starts at zero.

  • For each drawing command, the count is incremented as described in Sample Counting. If flags does not contain QUERY_CONTROL_PRECISE, an implementation may generate any non-zero result value for the query if the count of passing samples is non-zero.

Pipeline Statistics Queries

  • Pipeline statistics queries allow the application to sample a specified set of VkPipeline counters. These counters are accumulated by Vulkan for a set of either drawing or dispatching commands while a pipeline statistics query is active. As such, pipeline statistics queries are available on queue families supporting either graphics or compute operations.

  • The availability of pipeline statistics queries is indicated by the pipelineStatisticsQuery member of the VkPhysicalDeviceFeatures object (see vkGetPhysicalDeviceFeatures and vkCreateDevice for detecting and requesting this query type on a VkDevice).

  • A pipeline statistics query is begun and ended by calling vkCmdBeginQuery and vkCmdEndQuery, respectively.

  • When a pipeline statistics query begins, all statistics counters are set to zero. While the query is active, the pipeline type determines which set of statistics are available, but these must be configured on the query pool when it is created. If a statistic counter is issued on a command buffer that does not support the corresponding operation, or the counter corresponds to a shading stage which is missing from any of the pipelines used while the query is active, the value of that counter is undefined after the query has been made available. At least one statistic counter relevant to the operations supported on the recording command buffer must be enabled.

Performance Queries

  • Performance queries provide applications with a mechanism for getting performance counter information about the execution of command buffers, render passes, and commands.

  • Each queue family advertises the performance counters that can be queried on a queue of that family via a call to vkEnumeratePhysicalDeviceQueueFamilyPerformanceQueryCountersKHR. Implementations may limit access to performance counters based on platform requirements or only to specialized drivers for development purposes.

  • Performance queries use the existing vkCmdBeginQuery and vkCmdEndQuery to control what command buffers, render passes, or commands to get performance information for.

Mesh Shader Queries

  • When a generated mesh primitives query is active, the mesh-primitives-generated count is incremented every time a primitive emitted from the mesh shader stage reaches the fragment shader stage. When a generated mesh primitives query begins, the mesh-primitives-generated count starts from zero.

  • Mesh and task shader pipeline statistics queries function the same way that invocation queries work for other shader stages, counting the number of times the respective shader stage has been run. When the statistics query begins, the invocation counters start from zero.

Result Status Queries

  • Result status queries serve a single purpose: allowing the application to determine whether a set of operations have completed successfully or not, as indicated by the VkQueryResultStatusKHR value written when retrieving the result of a query using the QUERY_RESULT_WITH_STATUS_KHR flag.

  • Unlike other query types, result status queries do not track or maintain any other data beyond the completion status, thus no other data is written when retrieving their results.

  • Support for result status queries is indicated by VkQueueFamilyQueryResultStatusPropertiesKHR::queryResultStatusSupport, as returned by vkGetPhysicalDeviceQueueFamilyProperties2 for the queue family in question.
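
  • A minimal sketch of interpreting a status value read back with QUERY_RESULT_WITH_STATUS_KHR. It assumes the convention that positive values mean success, negative values mean failure, and zero means not ready. The constants below are local stand-ins that mirror the core VkQueryResultStatusKHR values as recalled here (check them against your vulkan_core.h), and `OperationState` and `classify_result_status` are hypothetical names.

```c
#include <stdint.h>

/* Local stand-ins for the core VkQueryResultStatusKHR values
 * (assumed: ERROR = -1, NOT_READY = 0, COMPLETE = 1). */
enum {
    STATUS_ERROR     = -1,
    STATUS_NOT_READY = 0,
    STATUS_COMPLETE  = 1
};

typedef enum {
    OPERATION_PENDING,
    OPERATION_SUCCEEDED,
    OPERATION_FAILED
} OperationState;

/* Classifies by sign rather than by exact value, so that
 * extension-defined error codes (other negative values)
 * are still treated as failures. */
OperationState classify_result_status(int32_t status)
{
    if (status > 0) {
        return OPERATION_SUCCEEDED;
    }
    if (status < 0) {
        return OPERATION_FAILED;
    }
    return OPERATION_PENDING;
}
```

  • Checking the sign instead of comparing against single constants keeps the code working when an extension adds further status codes.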

Other Queries

  • Transform Feedback Queries.

  • Primitives Generated Queries.

  • Intel Performance Queries.

  • Video Encode Feedback Queries.